Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ability to Trace "check" function with DD APM #2079

Merged
merged 21 commits into from
Sep 25, 2018
Merged

Conversation

nmuesch
Copy link
Collaborator

@nmuesch nmuesch commented Aug 20, 2018

What does this PR do?

Adds in the ddtrace lib as a static dependency
Adds ability to enable/disable tracing at a per instance level using a new decorator in datadog_checks_base, and enabled auto patching of requests

Only available with Agent 6 and requires two configuration options to be enabled.
One config flag in datadog.yaml: integration_tracing. With this enabled, requests is auto patched via the ddtrace library globally. With this disabled, no integrations will be traced, and no libraries will be auto patched.
A per instance config for each integration: trace_check - With this enabled, this specific integration (for those that provide support) will have their check function traced. Note that this is disregarded if integration_tracing is disabled globally (as is by default)

Add first example of tracing in OpenStack for the check function.

Motivation

Want the ability to see function performance for the more complicated/scaled integrations
Automatically patch the requests library behind a datadog.yaml config flag.

Ex: https://cl.ly/86a53f2ed3a9

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo

Additional Notes

This is currently enabled/disabled at a per instance level using trace_check in the integration configuration file. Doing this means there are two assumptions about the method being wrapped:

  • The method is a class method, so we can use the class name as the Service Name The Service name will be static across all integrations, and the resource name is the name of the function being bassed, and the span name is static/is always integration check
  • The first passed param is the instance so we can check if tracing is enabled/disabled for this check. EDIT: now we just check that instance is any of the params passed to the function using the inspect module, so we gain a little more flexibility here.

This approach was chosen because it allows us to easily add other libraries that ddtrace may have auto patching support for by simply adding it to the init.py file, while keeping the Agent's performance the same by default if the global tracing flag is disabled.

@nmuesch nmuesch requested a review from a team as a code owner August 20, 2018 15:46
@nmuesch nmuesch changed the title Adds ability to Trace check function with DD APM Adds ability to Trace "check" function with DD APM Aug 20, 2018
gmmeyer
gmmeyer previously approved these changes Aug 20, 2018
@@ -1176,6 +1176,7 @@ def ensure_auth_scope(self, instance):

return instance_scope

@trace_func
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to add it to all checks without to add this decorator here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an easy was to do that as, since we don't use a lot of frameworks that the tracing client supports out of the box, we need to do this "manual instrumentation". I think having a single decorator per check isn't a bad trade off here though. I also don't think we necessarily want/need it for all checks right now 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! That's fair. Thanks

from datadog_checks.config import is_affirmative
from ddtrace import tracer, patch

patch(requests=True)
Copy link
Contributor

@ofek ofek Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know import order matters for ddtrace currently, but we need to make this conditional too somehow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, let me think about that one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it appears that the parent span has to be started in order for the underlying request to be patched and have an associated span be sent. Disabling the flag with the current state of the code submits 0 traces from the check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to the __init__.py of the base package, it should be early enough.

gzussa
gzussa previously approved these changes Aug 21, 2018
from datadog_checks.config import is_affirmative
from ddtrace import tracer, patch

patch(requests=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to the __init__.py of the base package, it should be early enough.

def trace_func(func):
def function_wrapper(*args, **kwargs):
if is_affirmative(args[1].get('trace_check', False)):
with tracer.trace(func.__name__, service=args[0].__class__.__name__):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This IMO should be:

tracer.trace('integration.check', service='IntegrationsTracing', resource=self.name)

This would allow us to have a main entry in the trace UI showing IntegrationsTracing, then the list of traces broken down by check.name. Each one would show (for now, or by default...) one traced function, integration.check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning for making the Service name be the name of the class is so that we aren't locked into only decorating the check function.

If we make the service name static then the set of resources under that are going to be difficult to sift through if we add the decorator to a couple functions in a check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the two things are connected - you can have one big service and a bunch of resources grouped by it. Each resource would have one or more spans (likely one per function?) attached to it.

Let's not assume one trace == one function

if is_affirmative(args[1].get('trace_check', False)):
with tracer.trace(func.__name__, service=args[0].__class__.__name__):
return func(*args, **kwargs)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this else

patch(requests=True)

def trace_func(func):
def function_wrapper(*args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's decorate this with @functools.wraps(func)


def trace_func(func):
def function_wrapper(*args, **kwargs):
if is_affirmative(args[1].get('trace_check', False)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the inspect module to get any instance argument.

@nmuesch nmuesch dismissed stale reviews from gzussa and gmmeyer via 3542d64 August 30, 2018 21:30
@@ -5,6 +5,7 @@ boto==2.46.1
cffi==1.11.5
cryptography==2.3
cx_oracle==6.0.1
ddtrace==0.12.1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can upgrade it to https://github.com/DataDog/dd-trace-py/releases/tag/v0.13.0 because 0.12.1 had some instability issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the headsup

except ValueError:
return func(*args, **kwargs)
if is_affirmative(args[instance_index].get('trace_check', False)):
with tracer.trace('integration.check', service='IntegrationsTracing', resource=args[0].name):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args[0].name is the name of the check, right? About the service name, you can use integrations-tracing just because usually we use that naming convention.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, args[0] is just the name of the check here, in this case it comes in the UI as openstack. I updated the service name to match that convention ^

@@ -1,3 +1,6 @@
# (C) Datadog, Inc. 2018
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
from ddtrace import patch

patch(requests=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the Python interpreter is invoked from the Agent, do we know what integrations will be loaded? We may try to find a better place to patch enabled checks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masci Is this something you are familiar with? I can look into digging up the answer if not.

Do we know what the overhead of enabling this early on for all checks would be?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this PR up a bit to put the patching of any libraries behind the datadog.yaml config flag. I think this is OK because its disabled by default, and only really expected to be enabled if an issue is being investigated so not for an extended period of time.

@@ -1,3 +1,12 @@
# (C) Datadog, Inc. 2018
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)

from ddtrace import patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put this in the if

# All rights reserved
# Licensed under Simplified BSD License (see LICENSE)

from datadog_checks.config import is_affirmative
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from ..config import is_affirmative

instance_index = getargspec(func).args.index('instance')
except ValueError:
return func(*args, **kwargs)
if is_affirmative(args[instance_index].get('trace_check', False)) and datadog_agent.get_config('integration_tracing', False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's

if (
    cond1
    and cond2
):
    ...

datadog_checks_base/datadog_checks/utils/decorator.py Outdated Show resolved Hide resolved
# Integration Tracing is only available with Agent 6
pass

def trace_func(func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename to trace, tracer, or add_tracing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want it to name it the same thing as an exposed function from ddtrace? I chose trace_func at the time since its just a decorator that performs a standard manual instrumentation step for any function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps tracing or add_tracing then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with add_tracing

@nmuesch
Copy link
Collaborator Author

nmuesch commented Sep 20, 2018

@ofek I've thought about a way to test this and I'm not sure there is a clean way, please let me know if you have a way in mind 🙂

The issues I was having is that the datadog_agent import is only available if you have a running Agent6, and since the util gets imported earlier on, we'd have to mock datadog_agent very early on, or it gets set to None immediately. I don't think mocking that early for this test is worth it here.

I also don't think we'd be able to inspect the list of traces from the tracer to see if we got an expected trace from our decorator, and we should be able to assume the tracing client works?

Let me know if you have any feedback on that!

@codecov-io
Copy link

codecov-io commented Sep 20, 2018

Codecov Report

Merging #2079 into master will decrease coverage by 26.25%.
The diff coverage is 0%.

@@             Coverage Diff             @@
##           master    #2079       +/-   ##
===========================================
- Coverage      85%   58.74%   -26.26%     
===========================================
  Files         182       42      -140     
  Lines       12011     2846     -9165     
  Branches     1192      461      -731     
===========================================
- Hits        10210     1672     -8538     
+ Misses       1425     1069      -356     
+ Partials      376      105      -271

ofek
ofek previously approved these changes Sep 20, 2018
Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about testing for now. Let's confirm it works in Python 3 soon!

Also, I'm making changelog Changed for new dep.

@@ -0,0 +1,29 @@
# (C) Datadog, Inc. 2018
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice the name of the module before, sorry for the late comment but I think this should be tracing.py or something like that - decorator.py doesn't tell much about the role of this module in the package

if datadog_agent is None:
return wrapped(*args, **kwargs)

trace_check = any(t.get('trace_check') for t in args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.get will raise if one of the args is not a dict-like object.

I think we need a better way to configure tracing, inspecting the args of the decorated function is not flexible and feels like looking for trouble.

My proposal

  1. Move the settings from instance to init_config so tracing will be enabled per check and not per instance

  2. Build a register tracking the checks that are configured to send traces and store it at the module-level so that it'll behave like a singleton at runtime. Also provide accessors so that a check instance can add itself to the registry.

__tracing_config = set()

def trace_check(check_object):
    __tracing_config.add(check_object)

def traced(wrapped, instance, args, kwargs):
    ...
    trace_check = instance in __tracing_config
    ...

the check itself should enable tracing in its __init__:

from datadog_checks.utils.decorator import trace_check

...

def __init__(self, name, init_config, agentConfig, instances=None):
    if init_config.get('trace_check', False):
        trace_check(self)

masci
masci previously approved these changes Sep 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants